Identifying representative trees from ensembles.
Abstract
Tree-based methods have become popular for analyzing complex data structures where the primary goal is risk stratification of patients. Ensemble techniques improve predictive accuracy and address the instability of a single tree by growing many trees and aggregating them. In the process, however, the individual trees are lost. In this paper, we propose a methodology for identifying the most representative trees in an ensemble on the basis of several tree distance metrics. Although our focus is on binary outcomes, the methods are applicable to censored data as well. For any two trees, the distance metrics are chosen to (1) measure similarity of the covariates used to split the trees; (2) reflect similar clustering of patients in the terminal nodes of the trees; and (3) measure similarity in predictions from the two trees. Whereas the third metric focuses on prediction, the first two focus on the architectural similarity between two trees. The most representative trees in the ensemble are chosen on the basis of the average distance between a tree and all other trees in the ensemble. An out-of-bag estimate of the error rate is obtained using neighborhoods of representative trees. Simulations and data examples show gains in predictive accuracy when averaging over such neighborhoods. We illustrate our methods using a dataset of kidney cancer treatment receipt (binary outcome) and a second dataset of breast cancer survival (censored outcome).
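The selection rule described in the abstract can be illustrated with a small sketch. The abstract does not give formulas for the three distances, so the definitions below are assumptions chosen to match its wording: a covariate-mismatch fraction for (1), a pairwise co-clustering disagreement rate for (2), and a mean squared difference in predictions for (3). The function names, the tree representation (plain lists and sets rather than fitted trees), and the toy numbers are all hypothetical.

```python
from itertools import combinations


def covariate_distance(vars_a, vars_b, n_covariates):
    """Assumed metric (1): fraction of candidate covariates used by exactly one tree."""
    return len(set(vars_a) ^ set(vars_b)) / n_covariates


def clustering_distance(leaves_a, leaves_b):
    """Assumed metric (2): fraction of patient pairs that share a terminal
    node in one tree but not in the other."""
    pairs = list(combinations(range(len(leaves_a)), 2))
    disagreements = sum(
        (leaves_a[i] == leaves_a[j]) != (leaves_b[i] == leaves_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


def prediction_distance(pred_a, pred_b):
    """Assumed metric (3): mean squared difference between predicted probabilities."""
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)


def most_representative(items, distance):
    """Score each tree by its average distance to all other trees and
    return (index of the lowest-scoring tree, list of all scores)."""
    n = len(items)
    scores = [
        sum(distance(items[i], items[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    best = min(range(n), key=scores.__getitem__)
    return best, scores


# Hypothetical toy ensemble: predicted probabilities from three trees
# for the same four patients.
toy_predictions = [
    [0.1, 0.1, 0.8, 0.8],
    [0.2, 0.2, 0.7, 0.7],
    [0.9, 0.1, 0.9, 0.1],
]
best, scores = most_representative(toy_predictions, prediction_distance)
print("most representative tree:", best)
print("average distances:", scores)
```

In this toy example the second tree (index 1) has the smallest average distance to the others, so it would be chosen as the representative. The same `most_representative` helper works with either of the other two distances once each tree is summarized by its split covariates or by its terminal-node assignments.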
Journal: Statistics in Medicine
Volume 31, Issue 15
Pages: -
Publication date: 2012